Acoustic Fingerprint

An acoustic fingerprint is a condensed digital summary, a fingerprint, deterministically generated from an audio signal, that can be used to identify an audio sample or quickly locate similar items in an audio database.

Practical uses of acoustic fingerprinting include identifying songs, melodies, tunes, or advertisements; sound effect library management; and video file identification. Media identification using acoustic fingerprints can be used to monitor the use of specific musical works and performances on radio broadcasts, records, CDs, streaming media, and peer-to-peer networks. This identification has been used in copyright compliance, licensing, and other monetization schemes.


Attributes

A robust acoustic fingerprint algorithm must take into account the perceptual characteristics of the audio. If two files sound alike to the human ear, their acoustic fingerprints should match, even if their binary representations are quite different. Acoustic fingerprints are not hash functions, which must be sensitive to any small change in the data. They are more analogous to human fingerprints, where small variations that are insignificant to the features the fingerprint uses are tolerated. A smeared human fingerprint impression can still be matched accurately to another fingerprint sample in a reference database; acoustic fingerprints work in a similar way.

Perceptual characteristics often exploited by audio fingerprints include average zero crossing rate, estimated tempo, average spectrum, spectral flatness, prominent tones across a set of frequency bands, and bandwidth.

Most audio compression techniques make radical changes to the binary encoding of an audio file without radically affecting the way it is perceived by the human ear. A robust acoustic fingerprint allows a recording to be identified after it has gone through such compression, even if the audio quality has been reduced significantly. For use in radio broadcast monitoring, acoustic fingerprints should also be insensitive to analog transmission artifacts.
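
The difference between a conventional hash and a perceptual feature can be illustrated with a minimal sketch. It uses only the Python standard library; the helper names (`sine_wave`, `to_pcm16`, `zero_crossing_rate`), the test tone, and the 0.1 percent gain change are illustrative assumptions, not part of any particular fingerprinting system.

```python
import hashlib
import math
import struct


def sine_wave(freq_hz, sample_rate, n_samples, amplitude=0.5):
    """Return float samples in [-1, 1] for a pure tone (hypothetical helper)."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def to_pcm16(samples):
    """Pack float samples into little-endian 16-bit PCM bytes."""
    ints = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


original = sine_wave(440.0, 8000, 8000)
# Assumed stand-in for an inaudible change: reduce the gain by 0.1 percent.
altered = [s * 0.999 for s in original]

# A cryptographic hash reacts to any change in the encoded bytes.
hash_a = hashlib.sha256(to_pcm16(original)).hexdigest()
hash_b = hashlib.sha256(to_pcm16(altered)).hexdigest()

# The zero crossing rate depends only on sample signs, which scaling keeps.
zcr_a = zero_crossing_rate(original)
zcr_b = zero_crossing_rate(altered)

print("hashes equal:", hash_a == hash_b)
print("zero crossing rates:", zcr_a, zcr_b)
```

A single feature such as the zero crossing rate is far too coarse to identify a recording on its own; the sketch only shows why perceptual features survive changes that alter every byte of the file.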


Spectrogram

Generating a signature from the audio is essential for searching by sound. One common technique is creating a time-frequency graph called a spectrogram. Any piece of audio can be translated to a spectrogram. The audio is split into segments over time. In some cases adjacent segments share a common time boundary; in other cases adjacent segments overlap. The result is a graph that plots three dimensions of audio: frequency vs amplitude (intensity) vs time.
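
The segmentation described above can be sketched as a short-time Fourier transform. This is a minimal standard-library illustration, not a production implementation: the function names, the Hann window, the frame size of 64 samples, and the hop of 32 samples are all assumptions chosen for the example, and the naive transform is far slower than an FFT.

```python
import cmath
import math


def hann(n_points):
    """Hann window coefficients, used to taper each segment."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n_points - 1))
            for i in range(n_points)]


def dft_magnitudes(frame):
    """Magnitudes of the first N/2 + 1 DFT bins (naive O(N^2) transform)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame is a list of bin magnitudes.

    hop == frame_size gives segments that share a time boundary;
    hop < frame_size gives overlapping segments.
    """
    window = hann(frame_size)
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        segment = [samples[start + i] * window[i] for i in range(frame_size)]
        frames.append(dft_magnitudes(segment))
    return frames


# A 1 kHz tone sampled at 8 kHz: each bin spans 8000 / 64 = 125 Hz,
# so the energy should concentrate around bin 1000 / 125 = 8.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(512)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames,", len(spec[0]), "bins, strongest bin:", peak_bin)
```

Indexing `spec[t][f]` gives the amplitude at time segment `t` and frequency bin `f`, which are the three dimensions the spectrogram plots.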


Shazam

Shazam's algorithm picks out points where there are peaks in the spectrogram, which represent higher energy content. Focusing on peaks in the audio greatly reduces the impact that background noise has on audio identification. Shazam builds its fingerprint catalog as a hash table, where the key is the frequency. It does not mark just a single point in the spectrogram; rather, it marks a pair of points: the peak intensity plus a second anchor point. So the database key is not just a single frequency; it is a hash of the frequencies of both points. This leads to fewer hash collisions, improving the performance of the hash table.
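
The idea of keying a hash table on pairs of spectrogram peaks can be sketched as follows. This is a toy illustration loosely modelled on the description above, not Shazam's actual implementation: every function name, the `fan_out` and `threshold` parameters, the vote-counting match, and the hand-made spectrograms are assumptions made for the example.

```python
from collections import defaultdict


def find_peaks(spec, threshold):
    """Return (time, freq_bin) cells above `threshold` that are strictly
    larger than their neighbours in time and frequency."""
    peaks = []
    for t, frame in enumerate(spec):
        for f, value in enumerate(frame):
            if value < threshold:
                continue
            neighbours = []
            if f > 0:
                neighbours.append(frame[f - 1])
            if f + 1 < len(frame):
                neighbours.append(frame[f + 1])
            if t > 0:
                neighbours.append(spec[t - 1][f])
            if t + 1 < len(spec):
                neighbours.append(spec[t + 1][f])
            if all(value > n for n in neighbours):
                peaks.append((t, f))
    return peaks


def pair_keys(peaks, fan_out=3):
    """Pair each peak with up to `fan_out` later peaks and yield
    ((freq_a, freq_b), time_a); the frequency pair is the table key."""
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        later = [(t2, f2) for t2, f2 in peaks[i + 1:] if t2 > t1]
        for _, f2 in later[:fan_out]:
            yield (f1, f2), t1


def build_index(tracks, threshold=0.5):
    """Map each frequency-pair key to the (track_id, time) entries using it."""
    index = defaultdict(list)
    for track_id, spec in tracks.items():
        for key, t in pair_keys(find_peaks(spec, threshold)):
            index[key].append((track_id, t))
    return index


def match(index, query_spec, threshold=0.5):
    """Return the track sharing the most keys with the query, or None."""
    votes = defaultdict(int)
    for key, _ in pair_keys(find_peaks(query_spec, threshold)):
        for track_id, _ in index.get(key, ()):
            votes[track_id] += 1
    return max(votes, key=votes.get) if votes else None


def toy_spectrogram(n_frames, n_bins, peak_cells):
    """Build a silent spectrogram with unit-height peaks at the given cells."""
    spec = [[0.0] * n_bins for _ in range(n_frames)]
    for t, f in peak_cells:
        spec[t][f] = 1.0
    return spec


tracks = {
    "track_a": toy_spectrogram(8, 16, [(0, 3), (2, 7), (4, 5), (6, 11)]),
    "track_b": toy_spectrogram(8, 16, [(0, 2), (2, 9), (4, 13), (6, 4)]),
}
index = build_index(tracks)

# Query: the peaks of track_a plus uniform low-level background noise.
query = toy_spectrogram(8, 16, [(0, 3), (2, 7), (4, 5), (6, 11)])
query = [[value + 0.1 for value in frame] for frame in query]
print(match(index, query))
```

Because only cells above the threshold are considered, the added background level does not change which peaks are found, and keying on a pair of frequencies rather than one makes each key more specific, so fewer unrelated entries land under the same key.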


See also

* Chromaprint
* Automatic content recognition
* Digital video fingerprinting
* Feature extraction
* Parsons code
* Perceptual hashing
* Search by sound
* Sound recognition



External links


* A Review of Algorithms for Audio Fingerprinting (P. Cano et al., in International Workshop on Multimedia Signal Processing, US Virgin Islands, December 2002)
* Content-Based Retrieval of Music and Audio, by Jonathan Foote, ISS, National University of Singapore